Make URL case insensitive. #1350
Now URLs starting with hTTp or Https will be captured by the regular expression.
Case-insensitivity does not affect this regex's behavior with respect to ReDoS. That's not universally true, though; for some regexes it would matter.
Should other regexps also be case-insensitive? I'm thinking of the
Maybe a better question is: are there any regexes that need to be case-sensitive, or should we find a way to make all regexes case-insensitive by default?
```diff
@@ -606,7 +606,7 @@ inline.pedantic = merge({}, inline.normal, {
 inline.gfm = merge({}, inline.normal, {
   escape: edit(inline.escape).replace('])', '~|])').getRegex(),
   _extended_email: /[A-Za-z0-9._+-]+(@)[a-zA-Z0-9-_]+(?:\.[a-zA-Z0-9-_]*[a-zA-Z0-9])+(?![-_])/,
-  url: /^((?:ftp|https?):\/\/|www\.)(?:[a-zA-Z0-9\-]+\.?)+[^\s<]*|^email/,
+  url: /^((?:ftp|https?):\/\/|www\.)(?:[a-zA-Z0-9\-]+\.?)+[^\s<]*|^email/i,
```
Now that this is case-insensitive, could you also change `a-zA-Z` to `a-z`?
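With the `i` flag, a class like `[a-z]` also matches the uppercase letters, so `a-zA-Z` and `a-z` accept the same characters. A quick check, using the `url` pattern from the diff with the trailing `|^email` placeholder dropped (an assumption made only to keep the sketch self-contained):

```javascript
// url pattern from the diff (minus the `|^email` placeholder), with the `i` flag.
const withUpper = /^((?:ftp|https?):\/\/|www\.)(?:[a-zA-Z0-9\-]+\.?)+[^\s<]*/i;
// Same pattern with the now-redundant A-Z ranges removed.
const lowerOnly = /^((?:ftp|https?):\/\/|www\.)(?:[a-z0-9\-]+\.?)+[^\s<]*/i;

const samples = [
  'hTTp://Example.COM/Path',
  'HTTPS://www.GitHub.com',
  'Ftp://files.example.org',
  'WWW.example.com',
  'mailto:someone@example.com', // matches neither pattern
];

for (const s of samples) {
  const a = withUpper.exec(s);
  const b = lowerOnly.exec(s);
  // Both patterns should agree on whether, and how much, they match.
  console.log(s, '=>', a && a[0], '|', b && b[0]);
}
```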
Looks like this could be solved more easily by just changing line 618 from `inline.gfm.url = edit(inline.gfm.url)` to `inline.gfm.url = edit(inline.gfm.url, 'i')`.
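This suggestion relies on `edit()` accepting regex flags as its second argument and applying them when the pattern is rebuilt. The helper below is a simplified, hypothetical stand-in written for illustration, not marked's actual `edit` implementation:

```javascript
// Hypothetical, simplified regex-building helper in the style of `edit`.
// Assumption: the second argument holds the flags passed to `new RegExp`.
function edit(regex, flags = '') {
  let source = regex.source || regex; // accept a RegExp or a source string
  return {
    // Substitute a placeholder (string) with another pattern or string.
    replace(name, val) {
      source = source.replace(name, val.source || val);
      return this;
    },
    // Build the final RegExp with whatever flags were requested.
    getRegex() {
      return new RegExp(source, flags);
    },
  };
}

const gfmUrl = /^((?:ftp|https?):\/\/|www\.)(?:[a-zA-Z0-9\-]+\.?)+[^\s<]*/;

const sensitive = edit(gfmUrl).getRegex();
const insensitive = edit(gfmUrl, 'i').getRegex();

console.log(sensitive.test('hTTp://example.com'));   // false
console.log(insensitive.test('hTTp://example.com')); // true
```

Setting the flag once where the regex is built keeps the literal in the `inline.gfm` object unchanged.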
Be careful with this PR. See https://url.spec.whatwg.org/#url-writing and note the keyword "not" that appears there for certain schemes.
We are only matching
Here's at least one case that I think should be double-checked and possibly added to the tests:
Removing the double negative (the emphasized text in the quote immediately above), it seems like this use case may need case sensitivity:
In this browser, the address bar does accept absolute URLs in mixed case, but copying the URL back out always lowercases it. That's at least how this browser handles it.
The spec says it "must be one of the following", and ftp and http(s) match.
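As we read the url-writing section, schemes there are compared with an "ASCII case-insensitive match", which folds only `A` to `Z` onto `a` to `z`. A minimal sketch of that comparison, checked against the three schemes the `url` regex accepts (the helper names are our own):

```javascript
// ASCII lowercase: map only A-Z to a-z and leave every other code unit alone
// (unlike String.prototype.toLowerCase, which applies Unicode case mapping).
function asciiLowercase(s) {
  return s.replace(/[A-Z]/g, (c) => String.fromCharCode(c.charCodeAt(0) + 32));
}

function asciiCaseInsensitiveMatch(a, b) {
  return asciiLowercase(a) === asciiLowercase(b);
}

// Schemes accepted by the gfm `url` regex in the diff.
const ACCEPTED = ['ftp', 'http', 'https'];

function isAcceptedScheme(scheme) {
  return ACCEPTED.some((s) => asciiCaseInsensitiveMatch(scheme, s));
}

console.log(isAcceptedScheme('hTTp'));  // true
console.log(isAcceptedScheme('HTTPS')); // true
console.log(isAcceptedScheme('file'));  // false
```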
It also looks like GitHub doesn't mind if it isn't lowercase.
That doesn't mean GFM follows the W3's output guidelines for well-written, conformant HTML. The decisions are these:
We'll accommodate this regardless if it goes through, but filtering will be affected by items 1 and 2. That is probably why item 3 exists; perhaps it is a balance. Hence why I said to be careful, and why I haven't voted prematurely either way.
This only affects inline URLs when
One place this could be addressed is github/cmark; however, as far as I know there's no real unit test for it, since the W3 spec mentions bases and relative URLs, which are the exception case. I'm not against this PR, just noting the effects it will have. Filtering is where we will most likely need to adjust, by forcing it to lowercase (for security integrity). I still need to test this (when I get back to my dev station next week) in Node with the
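To illustrate why filtering needs to normalize case: a case-sensitive blocklist is bypassed by a mixed-case scheme, while an allow-list on the parsed `protocol` (which the WHATWG `URL` parser always lowercases) is not. This is a hypothetical filter for illustration only, not marked's sanitizer or any particular library's:

```javascript
// Naive, case-sensitive blocklist: easy to bypass with mixed case.
function naiveIsBlocked(href) {
  return href.startsWith('javascript:');
}

// Allow-list on the parsed scheme. The WHATWG URL parser lowercases the
// scheme, so `url.protocol` is always lowercase (e.g. 'http:').
const ALLOWED_PROTOCOLS = new Set(['http:', 'https:', 'ftp:']);

function isAllowedHref(href) {
  let url;
  try {
    url = new URL(href);
  } catch (err) {
    return false; // not a parseable absolute URL
  }
  return ALLOWED_PROTOCOLS.has(url.protocol);
}

console.log(naiveIsBlocked('JaVaScRiPt:alert(1)')); // false (bypassed)
console.log(isAllowedHref('JaVaScRiPt:alert(1)'));  // false
console.log(isAllowedHref('hTtP://example.com'));   // true
```

An allow-list is used here rather than a blocklist because it fails closed for schemes nobody thought to list.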
This works just fine:

```javascript
const str = 'hTtP://example.com';
const url = new URL(str);
console.log('node\t', process.version);
console.log('input\t', str);
console.log('url\t', url.href);
```
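Going a step further than the `new URL` check in this comment: the WHATWG `URL` parser lowercases the scheme and, for special schemes such as http, the host, but leaves the path exactly as written (paths may be case-sensitive on the server). A small check, runnable in Node 10+ or a browser:

```javascript
const url = new URL('hTtP://ExAmPle.CoM/Doc/This.html');

console.log(url.protocol); // http:
console.log(url.host);     // example.com
console.log(url.pathname); // /Doc/This.html (path case is preserved)
console.log(url.href);     // http://example.com/Doc/This.html
```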
A little more on the test range:

```shell
$ node -v && node -e 'var x = new URL("/doc/this.html", "hTtP://example.com"); console.log(x.protocol)'
v10.11.0
http:
```

This passes (pre-PR and without this dep), and by the spec it seems like it shouldn't, but that's probably a Node issue to be raised. Haven't had a chance to test the legacy

Expect this one to pass, since it uses an absolute URL and the legacy API (again without the PR and without this dep):

```shell
$ node -v && node -e 'var url = require("url"); var x = url.parse("hTtP://example.com/doc/this.html"); console.log(x.protocol)'
v10.11.0
http:
```

So basically, an i/o page on GH itself for GH's parsing would be helpful when a base is specified, as a final test. I'm still contemplating whether a local project's i/o page could handle this.

I'll test our sanitizer shortly to see whether it handles this the same way as Node. Our code uses case-insensitive tests, so we don't have to change that part; I'm just checking the sanitizer to be ultra-safe, since there could be catastrophic results for projects across the pond and up/downstream.

It passes with filtering on our sanitizer (post-PR and with this dep), although that isn't every sanitizer. Doesn't care if it's

The point still being: more tests are recommended, both in case it complies and in case it doesn't. I.e., be careful. :)

Apologies for the re-edits; I'm attempting to make this clearer for everyone and to keep the noise level down.
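On the "more tests" point, a few table-driven cases for the post-PR `url` pattern might look like the following. The cases are illustrative and not taken from marked's test suite; the trailing `|^email` placeholder is dropped to keep the sketch self-contained:

```javascript
// Post-PR gfm `url` pattern (with the `i` flag), minus the `|^email` placeholder.
const url = /^((?:ftp|https?):\/\/|www\.)(?:[a-zA-Z0-9\-]+\.?)+[^\s<]*/i;

// [input, expected matched text, or null for no match]
const cases = [
  ['http://example.com', 'http://example.com'],
  ['hTtP://example.com', 'hTtP://example.com'],
  ['HTTPS://example.com/Doc/This.html', 'HTTPS://example.com/Doc/This.html'],
  ['FTP://files.example.org', 'FTP://files.example.org'],
  ['Www.example.com', 'Www.example.com'],
  ['hTtP://example.com <next>', 'hTtP://example.com'], // stops at whitespace
  ['file:///etc/passwd', null],
  ['JaVaScRiPt:alert(1)', null],
];

for (const [input, expected] of cases) {
  const m = url.exec(input);
  const actual = m ? m[0] : null;
  console.log(actual === expected ? 'ok  ' : 'FAIL', JSON.stringify(input));
}
```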
Closing in favor of #1384